A Unified Approach to Active Dual Supervision for Labeling Features and Examples
نویسندگان
چکیده
When faced with the task of building accurate classifiers, active learning is often a beneficial tool for minimizing the requisite costs of human annotation. Traditional active learning schemes query a human for labels on intelligently chosen examples. However, human effort can also be expended in collecting alternative forms of annotation. For example, one may attempt to learn a text classifier by labeling words associated with a class, instead of, or in addition to, documents. Learning from two different kinds of supervision adds a challenging dimension to the problem of active learning. In this paper, we present a unified approach to such active dual supervision: determining which feature or example a classifier is most likely to benefit from having labeled. Empirical results confirm that appropriately querying for both example and feature labels significantly reduces overall human effort—beyond what is possible through traditional one-dimensional active learning.
منابع مشابه
A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels
In active dual supervision, not only informative examples but also features are selected for labeling to build a high quality classifier with low cost. However, how to measure the informativeness for both examples and feature on the same scale has not been well solved. In this paper, we propose a non-negative matrix factorization based approach to address this issue. We first extend the matrix ...
متن کاملActive Dual Supervision: Reducing the Cost of Annotating Examples and Features
When faced with the task of building machine learning or NLP models, it is often worthwhile to turn to active learning to obtain human annotations at minimal costs. Traditional active learning schemes query a human for labels of intelligently chosen examples. However, human effort can also be expended in collecting alternative forms of annotations. For example, one may attempt to learn a text c...
متن کاملDocument Clustering with Dual Supervision
Nowadays, academic researchers maintain a personal library of papers, which they would like to organize based on their needs, e.g., research, projects, or courseware. Clustering techniques are often employed to achieve this goal by grouping the document collection into different topics. Unsupervised clustering does not require any user effort but only produces one universal output with which us...
متن کاملGuided Feature Labeling for Budget-Sensitive Learning Under Extreme Class Imbalance
Extreme class skew is a hurdle in many machine learning tasks. In such skewed settings, traditional methods for procuring labeled examples, including random sampling and active learning, are often ineffective— they struggle to find representative minority examples. The framework of Dual Supervision, which incorporates feature-based background information into traditional supervised learning, pr...
متن کاملA UNIFIED MODEL FOR RESOURCE-CONSTRAINED PROJECT SCHEDULING PROBLEM WITH UNCERTAIN ACTIVITY DURATIONS
In this paper we present a unified (probabilistic/possibilistic) model for resource-constrained project scheduling problem (RCPSP) with uncertain activity durations and a concept of a heuristic approach connected to the theoretical model. It is shown that the uncertainty management can be built into any heuristic algorithm developed to solve RCPSP with deterministic activity durations. The esse...
متن کامل